Re: Mac OS X: system shutdown prevents checkpoint - Mailing list pgsql-hackers
From | Peter Bierman |
---|---|
Subject | Re: Mac OS X: system shutdown prevents checkpoint |
Date | |
Msg-id | v03130303b8f611e5d925@[17.202.21.231] |
In response to | Mac OS X: system shutdown prevents checkpoint (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Mac OS X: system shutdown prevents checkpoint
|
List | pgsql-hackers |
At 1:26 AM -0400 4/30/02, Tom Lane wrote:
>I've been looking into Francois Suter's recent reports of Postgres not
>shutting down cleanly on Mac OS X 10.1.
>
>Now here's what I see in the case of shutting down the OS X system:
>
>2002-04-30 00:25:35 [376] DEBUG: pmdie 15
>2002-04-30 00:25:35 [376] DEBUG: smart shutdown request
>
>... and nothing more. Actual system shutdown (power down) occurred at
>approximately 00:26:06 by my watch, over thirty seconds later than the
>postmaster received SIGTERM. So there was plenty of time to do the
>checkpoint subprocess. (Indeed, I believe that thirty seconds is the
>grace period Darwin's init process allows SIGTERM'd processes before
>giving up and hard-killing them. So the system was actually sitting and
>waiting for the postmaster.)
>
>What we appear to have here is that the kernel is not allowing the
>postmaster to fork a checkpoint subprocess. But there's no indication
>that the postmaster got a fork() error return, either. Seems like it's
>just hung.

Unfortunately, I don't have time right now to look into this myself, and because I just moved, I don't have a machine I can give someone an account on to try it themselves (PacBell says 20 days for DSL xfer).

But I asked around, and got a pair of tips from the Mac OS X Core OS group. If you want to converse with either of the people named below, they're both active on the darwin-development mailing list. (http://lists.apple.com/mailman/listinfo/darwin-development)

-pmb

At 1:52 PM -0700 5/1/02, Jim Magee wrote:
>On Wednesday, May 1, 2002, at 01:34 PM, Peter Bierman wrote:
>
>> Is fork() disallowed after shutdown starts?
>
>No, it's allowed. But, depending upon timing, the new process may be
>hammered with a SIGTERM right away (maybe even before main()). It is
>always very tricky to fork() as the result of a daemon getting a signal.
>They are often process group leader, and so their children may get the
>same signal they just got.
>
>POSIX is very ambiguous on whether a new process in the group should also
>get the signal while we're still delivering them, or whether it shouldn't
>(because it wasn't in the group at the time the signal was first
>delivered). Both choices have their problems, and so developers have to
>deal with either case. Do you have signals masked off correctly before
>the fork()/exec()?
>
>Is fork really returning a PID in the parent, and it just looks like the
>child didn't make it to returning from its fork() call? There are some
>preparation things that happen in dyld and libc as part of returning from
>fork in the child, and these run before we make it look like fork()
>returned in the child. If they encounter an error (maybe because the
>services they need to talk to are no longer available), they have nothing
>else to do but call _exit() - making it look like the child never returned
>from fork().
>
>But in either the dyld/libc exit case, or the signal case, the parent
>should get a wait result indicating why the child went away so
>prematurely. If it was an exit(), maybe using vfork() will yield better
>results, as there is no need for child-side setup in the vfork() case.
>
>--Jim

At 2:01 PM -0700 5/1/02, Matt Watson wrote:
>
>It could be that the child has blocked trying to contact a dead lookupd.